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Several cosmological measurements have attained significant levels of maturity and accuracy over 
the last decade. Continuing this trend, future observations promise measurements of the statistics 
of the cosmic mass distribution at an accuracy level of one percent out to spatial scales with k ~ 
10/iMpc~^ and even smaller, entering highly nonlinear regimes of gravitational instability. In order 
to interpret these observations and extract useful cosmological information from them, such as 
the equation of state of dark energy, very costly high precision, multi-physics simulations must be 
performed. We have recently implemented a new statistical framework with the aim of obtaining 
accurate parameter constraints from combining observations with a limited number of simulations. 
The key idea is the replacement of the full simulator by a fast emulator with controlled error bounds. 
In this paper, we provide a detailed description of the methodology and extend the framework to 
include joint analysis of cosmic microwave background and large scale structure measurements. Our 
framework is especially well-suited for upcoming large scale structure probes of dark energy such as 
baryon acoustic oscillations and, especially, weak lensing, where percent level accuracy on nonlinear 
scales is needed. 

PACS numbers: 98.80.-k, 02.50.-r 



I. INTRODUCTION 

Over the last three decades observational cosmol- 
ogy has made extraordinary progress in determining 
the make-up of the Universe and its expansion history. 
The first precision observations were obtained from the 
cosmic microwave background (CMB), beginning with 
the all-sky temperature anisotropy measurements from 
COBE [T] which provided an encouraging confirmation 
of current theories of the early Universe and the forma- 
tion of large scale structure. Follow-up measurements 
from the ground [1, S 0] j balloons (9| , and space 
have resulted in constraints on the main cosmological pa- 
rameters at better than the 10% percent level of accuracy, 
and the Planck satellite mission [8| promises even further 
improvement. But CMB measurements are not the only 
observational source for precision cosmology. Structure 
formation probes such as large-scale surveys of the distri- 
bution of galaxies, and weak lensing and cluster surveys, 
are reaching similar levels of statistical and systematic 
control, as are supernova observations. These newer tech- 
niques, exemplified by surveys such as the Sloan Digital 
Sky Survey (SDSS) '9] yield complementary data to the 
CMB to help determine the large-scale description of the 
Universe [lOj . 

Precision measurements from several different cosmo- 
logical probes have revealed a highly unexpected result: 
roughly 70% of the Universe is made up of a mysteri- 
ous dark energy which is responsible for a recent epoch 
of accelerated expansion. Understanding the nature of 
dark energy is the foremost challenge in cosmology to- 
day. Ground-based telescopes and satellite missions have 



been proposed or are under development to measure the 
equation of state parameter of dark energy w = p/p (p is 
the pressure and p the density) at the one percent level, 
and its time derivative to 10%. Cosmic structure-based 
methods to understand the nature of dark energy include: 
baryon acoustic oscillations as probed by the large-scale 
distribution of galaxies [ll| , weak lensing measurements 
of the dark matter distribution [l^, and measurements 
of the abundance of clusters of galaxies [13|. All three 
probes require the understanding of nonlinear physics at 
different length scales. At small scales, in addition to 
gravity, baryonic physics plays an important role, signif- 
icantly complicating the modeling task. 

As cosmological observations continue to improve, in- 
creasing demands are placed on the underlying theory. 
Since cosmology is an observational science, the role of 
theory in interpreting observations is crucial to the suc- 
cess of the entire enterprise. Therefore, in order to inter- 
pret and optimally design future observations, theoretical 
predictions have to be at least as accurate - preferably 
more accurate - than the observations. In different are- 
nas of cosmology, however, the individual levels of theo- 
retical control are far from uniform. 

The growth and formation of large scale structure in 
the Universe results from the action of the gravitational 
instability on primordial fluctuations. Currently, by far 
the most favored scenario for generating these fluctua- 
tions is perturbations from inflation [l3| , and theoretical 
predictions for most simple inflationary models can be 
computed rather precisely p^ . certainly better than the 
level of accuracy set by near-future CMB observations. 
A key theoretical task lies in connecting the primordial 
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fluctuations to present-day observations. 

Of all cosmological probes, our understanding of the 
physics of the CMB is by far the most advanced. Because 
linear theory is applicable in this case, observables such as 
the temperature power spectrum can be determined with 
high accuracy, the linearized Einstein equations and rele- 
vant Boltzmann equations being treated more or less fully 
as, e.g., in Ref. ^16j . This approach is rather expensive, 
however, and the development of more efficient meth- 
ods has been an important research activity in the last 
decade. The most popular approach is the line-of-sight 
integration method, the underlying algorithm in codes 
like CMBFAST [l7| or CAMB ^ which are the major 
resources used for CMB analysis today. A single run with 
such a code takes only a few seconds. Since cosmological 
parameter estimation can often involve tens of thousands 
of Markov Chain Monte Carlo (MCMC) simulation runs, 
several approaches have been developed to replace even 
these codes by some type of "look-up" technique. These 
methods include purely analytic fits fl^ . [20j . combina- 
tions of analytic and semi-analytic fits ■21'|, and interpola- 
tion schemes based on a large set of training runs ^22,, ■ 
Most of the approximations are accurate at the 5% level 
over their range of validity, some being accurate at the 
sub-percent level over a limited range of parametric vari- 
ation. The accuracy of all these schemes deteriorates very 
rapidly, however, if the parameter ranges under consid- 
eration are expanded. Also, in general it is non-trivial to 
extend these schemes to include additional parameters or 
different data sets. We will return to some of these issues 
in more detail in Appendix |X] where we compare these 
methods to those explained in this paper. 

For large scale structure probes of cosmology the situa- 
tion is very different. The treatment of nonlinear physics 
can (mostly) no longer be avoided, and depending on the 
scales of interest, baryonic physics has to be treated ac- 
curately as well. In order to predict the matter power 
spectrum or the halo mass function in the regimes of in- 
terest, large, costly A^-body codes have to be resorted to. 
Fits to the matter power spectrum such as those given 
in Refs. [H, [2^ are accurate at the 10% level, an order 
of magnitude shy of that eventually required. 

In the case of baryon acoustic oscillations, the relevant 
length scales of interest are around 100/i~^Mpc, and it is 
sufficient to carry out dark matter-only simulations (the 
understanding of systematics with regard to galaxy prop- 
erties may require incorporation of extra physics). For 
upcoming, near-future weak lensing measurements the 
scales of interest are around 10/i~^Mpc, once again, dark 
matter only simulations being able to faithfully capture 
all the physics relevant on those scales. Future ground- 
based telescopes [1^ and space missions, such as the 
Joint Dark Energy Mission 28] , will push the weak lens- 
ing scales beyond l/i~^Mpc. At these scales baryonic 
physics affects the matter power spectrum at the 10% 
level (2^ |30| and must be included in the simulations. 
Clusters of galaxies probe even smaller scales - in this 
case, an accurate treatment of gas physics and astrophys- 



ical feedback mechanisms is absolutely essential |13| . 

All of these simulation tasks are major undertakings, 
even for the "easiest" cases, where pure dark matter sim- 
ulations are sufficient, i.e., baryon acoustic oscillations 
and weak lensing on scales > lOft-^^Mpc. A typical high- 
accuracy simulation for such studies would be in the bil- 
lion particle, Gigaparsec cubed volume class. Such a sim- 
ulation carried out with the treecode HOT [3l[, one of 
the world's most efficient A^-body codes, requires roughly 
30,000 Cpu-hours (compared to a few seconds of a CAMB 
run for CMB predictions) on current hardware. It is, 
therefore, immediately obvious that - at least in the fore- 
seeable future - a brute force approach to running a large 
number of cosmological models with an A^-body code will 
not be feasible. One can envision running hundreds of 
state-of-the-art simulations, but a number like tens of 
thousands would be well out of reach. 

It is therefore important to investigate how to extract 
robust predictions for untried cosmological (and model- 
ing) parameter settings based on a relatively small num- 
ber of very accurate simulations. A successful framework 
for achieving these aims should: 

1. require only a tractable number of costly simula- 
tions to create an accurate emulator which can re- 
place the full simulator, 

2. provide an optimal sampling strategy for the simu- 
lation runs to obtain the best possible performance 
of the emulator, 

3. integrate the uncertainties from the emulator pre- 
dictions into the parameter constraints, 

4. be easy to extend to include diverse data sets, 

5. be capable of handling a large set of cosmological 
parameters without catastrophically increasing the 
computational overhead. 

We have recently introduced such a statistical frame- 
work (3^ in order to determine cosmological and model 
parameters and associated uncertainties from simulations 
and observational data (for an overview of the basic ideas 
see, e.g., Refs. [11] and [11]). The framework integrates 
a set of interlocking procedures: (i) simulation design - 
the determination of the parameter settings at which to 
carry out the simulations; (ii) emulation - given simu- 
lation output at the input parameter settings, how to 
estimate the output at new, untried settings; (iii) uncer- 
tainty and sensitivity analysis - determining the varia- 
tions in simulation output due to uncertainty or changes 
in the input parameters; (iv) calibration - combining ob- 
servations (with known errors) and simulations to esti- 
mate parameter values consistent with the observations, 
including the associated uncertainty. The last step en- 
ables predictions of new cosmological results with a set 
of uncertainty bounds. 

Our initial results are very promising. We find that 
only a relatively small number of (sufficiently accurate) 
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simulations appear to be required in order to observa- 
tionally constrain several cosmological parameters at the 
few percent level. Partly, this is due to the fact that 
in cosmology the response surface being modeled by the 
emulator is very smooth, and partly due to the rela- 
tively narrow range of variation for the prior values of 
some parameters. Our choices of Gaussian process (GP) 
modeling for the emulator and the associated stratified 
sampling procedure, described in Section III CI below, are 
known to function particularly well under these circum- 
stances. 

In this paper we discuss the methodology behind the 
framework in some detail. For concreteness we focus on 
a simple example application: Estimation of five cosmo- 
logical parameters from dark matter structure formation 
simulations and a synthetic set of "WMAP -f SDSS" 
measurements of the matter power spectrum. The sta- 
tistical framework is introduced in Section |TT1 along with 
an explanation of how the synthetic datasets were con- 
structed, the results of various tests, as well as the final 
results from the estimated posterior distribution of the 
five cosmological parameters. The framework is extended 
in Section mil to combine disparate measurements, in the 
present case, separate measurements of the matter power 
spectrum and of the CMB temperature anisotropy. Con- 
clusions and future directions are presented and discussed 
in Section llVI In Appendix \X\ we compare our approach 
for fast calculation of the CMB temperature anisotropy 
with other interpolation schemes. 



II. THE STATISTICAL FRAMEWORK 

In this section we describe our statistical methodology 
aimed at confronting physical observations with output 
from a simulation model in order to best infer unknown 
model parameters. As a specific application example, we 
consider a single set of synthetic observations y of the 
mass power spectrum (Fig. |T]) along with a finite sample 
set of mass power spectra derived from A^-body simula- 
tions run with different choices of cosmological parame- 
ters. 

The simulation model requires pe-vectov 9* of input 
parameter settings (in our case cosmological parameters) 
in order to produce a mass power spectrum ri[k;9*) {k 
being the wavenumber). The simplest possible assump- 
tion is to postulate that the vector of observations y is 
a noisy version of the simulated spectrum ri{k; 9) at the 
true setting 9: 



where L{y\rj{k]9)) comes from the normal sampling 
model for the data 



y = ri{k;9) + e, 



(1) 



where the error vector e is normal, with zero mean and 
variance Sy. Given a prior distribution n{9) for the true 
parameter vector 9, the resulting posterior distribution 
TT{9\y) for 9 is given by 



L{y\rj{k; 9)) = exp i^--{y - r,{k; 9)f^-\y - r;(fc; 9)) 

(3) 

In principle, this posterior distribution could be explored 
via Markov Chain Monte Carlo (MCMC) as has become 
a standard practice in cosmological data analysis. How- 
ever, if a single evaluation of rj{k; 9) requires hours (or 
days) of computation, a direct MCMC-based approach is 
infeasible. 

Our approach deals with this computational bot- 
tleneck by treating ri{-) as an unknown function to 
be estimated from a fixed collection of simulations 
r]{k; 91), . . . , r]{k; 9^). This approach requires a prior dis- 
tribution for the unknown function r](-), and treats the 
simulation output 77* — {ri{k; 91), ... , rj(k\ 9^))'^ as addi- 
tional data to be conditioned on for the analysis. Hence 
there is an additional component of the likelihood ob- 
tained from the sampling model for 77* given the under- 
lying function ri(-) which we denote by L(?7*|77(-)). 

For this case, the resulting posterior distribution has 
the general form 

7r(9,r^{-)\y,r^*) cx L{y\7j{9)) ■ L{f^*\7j{-)) -nirji-)) -^{9), (4) 

which has traded direct evaluations of the simulator 
model for a more complicated form which depends 
strongly on the prior model for the function ri{-). Note 
that the marginal distribution for 9 will be affected by 
uncertainty regarding ??(■). 

In the following subsections, we describe in detail a 
particular formulation of Eqn. Q in the context of the 
synthetic mass power spectrum application which was 
earlier used in Ref. \3^. This formulation has proven 



7r{9\y)^Liy\rj{k;9))-TT{9), 



(2) 



fruitful in a variety of physics and engineering applica- 
tions which combine field observations with detailed sim- 
ulation models for inference. In particular we cover ap- 
proaches for choosing the m parameter settings at which 
to run the simulation model, and a (prior) model - or em- 
ulator - which describes how ?/(•) is modeled at untried 
parameter settings. Section III Dl describes how the ob- 
served data is combined with the simulations and the em- 
ulator to yield the posterior distribution. In the following 
section we will demonstrate how this formulation can be 
extended to combine information from different data sets 
from galaxy surveys and cosmic microwave background 
measurements. 



A. The Synthetic Power Spectrum 

A synthetic data set has the key advantage that the 
underlying cosmological parameters are known a priori. 
This allows direct testing of any proposed statistical pro- 
cedure for estimating these parameters. In order to gen- 
erate the synthetic power spectrum we run ten realiza- 
tions of a given cosmology in a large cosmological vol- 
ume with the particle mesh (PM) code MC'^ (for more 
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information and comparison results against other codes, 
see Ref. i35|]). The matter transfer function used to set 
the initial conditions is generated using CMBFAST ■ 
Averaging over the ten realizations (each of which cov- 
ers a simulation volume of 450/i^^Mpc cubed) produces 
a smooth power spectrum suppressing uncertainties due 
to cosmic variance. 

We consider five input parameters for the power spec- 
trum in a ACDM cosmology, the spectral index, n, the 
Hubble parameter in units of 100 km/s/Mpc, h, the nor- 
malization of the amplitude as specified by ag, and the 
dark matter and baryonic contributions to the matter 
density specified as fractions of the critical density, r2cDM 
and flh, respectively 



9 = (n, /l, cr8,SlcDM,^^b)- 



(5) 



The further assumption of spatial flatness, fitot = 1, then 
uniquely fixes the contribution of the cosmological con- 
stant, r^A- The five input parameters chosen for generat- 
ing the synthetic power spectrum are 



e ^ (0.99, 0.71, 0.84, 0.27, 0.044). 



(6) 



We match the nonlinear power spectrum at fc = 
0.1/iMpc~^ to the linear power spectrum in order to 
increase the fc-range down to very large length scales, 
k = O.OOl/iMpc"^. Next, we choose 28 points from this 
combined power spectrum spaced roughly in the same 
bins as a typical matter power spectrum extracted from 
combined CMB and large-scale structure observations, 
more specifically, those for WMAP [7| in the low k- 
range transitioning to values typical for SDSS data (36| 
in the large fc-range. Finally, the points are moved off the 
base power spectrum according to a Gaussian distribu- 
tion with 1-cr confidence. The resulting "measurement" 
and the smooth input power spectra are shown in Fig. [TJ 
These synthetic observations will be the underlying data 
set for verifying and testing the statistical analysis frame- 
work which we now discuss in detail. 



B. The Simulation Design 

Ongoing and near-future observations set stringent re- 
quirements on the accuracy of theoretical predictions for 
observables such as the power spectrum and the halo 
mass function. It will soon be insufficient to use ana- 
lytic fits for the power spectrum (see, e.g. , Refsj2jj2£ 
or Press-Schechter like fits (e.g., Refs. [ssl. Isol. |40|. l4lL |42 
l43j l for the mass function. Fully nonlinear treatments 
based on simulations will be needed, especially if the aim 
is to get reliable results on small scales {k > 0.2/i~^Mpc). 
As discussed earlier, such simulations can be very costly, 
especially if they are not restricted only to dark matter, 
but also include gas physics. In addition, the dimension- 
ality of the parameter space to be explored is large, with 
possibly of order twenty parameters to be considered. 
This combination of a limited number of simulation runs 
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FIG. 1: Twenty-eight data points mimicking a combined data 
set from a large scale structure survey and CMB data. The 
error bars are chosen to match the power spectrum calcula- 
tion in Ref. [s^]. The dashed line shows the original input 
power spectrum from ten realizations, while the gray lines 
show a subset of the 128 simulated power spectra, discussed 
in Section HTB] 



and a rather large number of cosmological and modeling 
parameters to be constrained demands a thoughtful de- 
sign strategy for deciding on the parameter settings at 
which to run the simulations. 

The simulation design refers to a sequence of simula- 
tion runs carried out at m input settings which vary over 
predefined ranges for each of the pg input parameters: 



\9n 



\9; 



(7) 



We use 9* to differentiate the design input settings from 
the true value of the parameter vector 9 which is to be 
estimated. 

The design of computer experiments, as simulations 
are often referred to in the statistical literature, has re- 
ceived considerable attention recently in the statistics 
community, spurred on largely by the increasing use of 
complex simulation models to augment understanding 
gained from experiments or observations (see Ref. [i^, 
Chs. 5-6, for a recent survey of the area). The goal in 
our application is to use a sequence of simulation runs to 
build a GP-based emulator for the expensive simulation 
code with the aim of predicting code output at untried 
parameter input settings. A GP model typically interpo- 
lates the output of the "training" simulations obtained 
from the experimental design, and gives predictions that 
vary smoothly with changes in the input parameters. Be- 
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cause the GP model exploits the smoothness in the sim- 
ulation response (as a function of the input parameters), 
space-filling Latin hypercube (LH) designs have proven 
to be well suited to the purpose of building GP-based em- 
ulators m, 111]. In particular, we have used orthogonal 
array (OA)-based LH designs [47| as well as symmetric 
LH designs f48l|. Fig. [5] shows the m = 128 point design 
over the pe = 5 dimensions used in this analysis. This de- 
sign was constructed by perturbing a 5-level orthogonal 
array design so that each 1-dimensional projection gives 
an equally spaced set of points along the standardized 
parameter range [0,1]. 

The actual parameter ranges used for the m = 128 
simulations are 

0.8 < n < 1.4, 

0.5 < h < 1.1, 

0.6 < CTs < 1-6, 
0.05 < QcDM < 0.6, 
0.02 < f^b < 0.12. (8) 

These ranges are standardized to [0, 1]^ by shifting and 
scaling each interval. (The resulting simulations are pro- 
duced by joining spectra obtained from GMBFAST and 
MG^ as described in Section Hi Al and as shown in FiglTJ) 
Such space filling LH designs are well-suited to the 
strengths of a GP model which can be fit to an arbi- 



trary set of design points. In contrast, more standard 
interpolation schemes (e.g., Ref. [i^, Gh. 3.6) typically 
require a grid-based design for interpolation. Grid-based 
designs are very inefficient since expanding even a sparse 
grid over a pg-dimensional space will be computation- 
ally prohibitive. For our example, where po — 5, even 
a [0, i, 1]^ grid requires 243 simulations. The grid-based 
approach, while simpler, also gives very poor coverage 
of low-dimensional projections. In our example, most of 
the simulator activity is explained by two parameters, trg 
and ilcDM, for which the grid-based design assigns only 
9 unique values. 



C. Emulating Simulator Output 

Our analysis requires the development of a probabil- 
ity model to describe the simulator output at untried 
settings 9. To do this, we use the simulator outputs to 
construct a GP model that "emulates" the simulator at 
arbitrary input settings over the (standardized) design 
space [0, 1]^" . To construct this emulator, we model the 
simulation output using a p,j-dimensional basis represen- 
tation: 

Pv 

Vik; 0)^Y1 Mk)wdO) + e., 9 e [0, 1]''*, (9) 
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where {(l>i{k), . . . ,(/)p^(fc)} is a collection of orthogonal, 
n,,-dimensional basis vectors, the Wi{9)^s are GPs over 
the input space, and e is an n^-dimensional error term. 
This type of formulation reduces the problem of build- 
ing an emulator that maps [0, 1]^" to i?"'' to building 
independent, univariate GP models for each Wi{9). The 
details of this model specification are given below. 

Output from each of the m simulation runs prescribed 
by the design results in n^-dimensional vectors, which we 
denote by rji, . . . Trjm- Since the simulations rarely give 
incomplete output, the simulation output can often be 
efficiently represented via principal components [1^ . We 
first standardize the simulations by centering about the 
mean of raw simulation output vectors: ^ Sjli '^i- 
then scale the output by a single value so that its vari- 
ance is 1. This standardization simplifies some of the 
prior specifications in our models. We also note that, 
depending on the application, an alternative standard- 
ization may be preferred. Whatever the choice of the 
standardization, the same standardization is also applied 
to the experimental data. 

We define z/sims to be the x m matrix obtained by 
column-binding the (standardized) output vectors from 
the simulations 



FIG. 2: Lower triangle of plots: 2-dimensional projections 
of a m = 128 point, 5-level, OA design. Upper triangle: An 
OA-based LH design obtained by spreading out the 5 level OA 
design so that each 1-dimensional projection gives an equally 
spaced set of points along [0,1]. 



ysims = [m; - ■ ■ ;vm]- (lO) 

Typically, the size of a given simulation output n,, is 
much larger than the number of simulations carried out, 
TO. We apply the singular value decomposition (SVD) to 
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the simulation output matrix ysims giving 
ysims = UDV^, 



(11) 



where [/ is a n,, x m orthogonal matrix, _D is a diago- 
nal m X m matrix holding the singular values, and V 
is a TO X TO orthonormal matrix. To construct a p,,- 
dimensional representation of the simulation output, we 
define the principal component (PC) basis matrix <i>^ to 
be the first columns of [-^UD]. The resulting prin- 
cipal component loadings or weights are then given by 
[v^y], whose columns have variance 1. 

For representing the mass power spectrum we found 
that it is adequate to take p,, = 5 so that = 
[4'ij 4'2', (I>3j 4>4:j 4'5]j the basis functions ^i, ^2, 4'3j 4>i f-i^d 
05 are shown in Fig. [31 Note that the 0i's are functions 




Wi{9') when the input conditions 9 and 9' are identical, 
except for a difference of 0.5 in the Ith component. Note 
that this interpretation makes use of the standardization 
of the input space to [0, 1]^". 

Restricting to the to input design settings given in 
Eqn. ([7]), we define the to- vector Wi to be the restriction 
of the process u'i(-) to the input design settings 



Wi ^ {wi{9l), . . . ,Wi{9*^))'^ , i = 1, 



(14) 



In addition we define R(9*; p^i) to be the m x m corre- 
lation matrix resulting from applying Eqn. (jl3p to each 
pair of input settings in the design. The pe-vector p^i 
gives the correlation distances for each of the input di- 
mensions. 

At the m simulation input settings, the mp^-vector 



Wi 



T ^T 
Vv' 



N 



then has prior distribution 
/0\ (K, \\ 

'•• 



(15) 



-3.0 -2.0 -1.0 -3.0 -2.0 -1.0 -3.0 -2.0 -1.0 

log wai/anumber log(k) log wai/anumber log(l() log wai/anumber log(k) 

FIG. 3: Simulations (left), mean (center), and the first five 
principal component bases (right) derived from the simulation 
output for the power spectrum. 

of the logarithm of the wave number, k. 

We use the basis representation of Eqn. ([9]) to model 
the n^-dimensional simulator output over the input 
space. Each PC weight Wi{9), i — 1, . . . ,p^, is then mod- 
eled as a mean-zero CP 

w;.(0)^GP(O,A-ii?(0,0';p™)), (12) 

where X^i is the marginal precision of the process and 
the correlation function is given by 

Pe , 2 

RiO,9';p^.) = l[pZ''''^ ■ (13) 

This is the Gaussian covariance function, which gives 
very smooth realizations, and has been used previously in 
Refs. (ssl. [Sll to model simulation output. An advantage 
of this product form is that only a single additional pa- 
rameter is required per additional input dimension, while 
the fitted CP response still allows for rather general in- 
teractions between inputs. We use this Gaussian form for 
the covariance function because the simulators we work 
with tend to respond very smoothly to changes in the in- 
puts. Depending on the nature of the sensitivity of simu- 
lation output to input changes, one may wish to alter this 
covariance specification to allow for rougher realizations. 
The parameter p^u controls the spatial range for the Zth 
input dimension of the process Wi (9) . Under this param- 
eterization, pwii gives the correlation between Wi{9) and 



= -R(^'* ; Pwi ) , 

which is controlled by precision parameters held in A^j 
and pr, ■ Pe spatial correlation parameters held in p^ . 

The centering of the simulation output makes the 
choice of zero mean prior appropriate. The prior above 
can be written more compactly as 

u;~7V(0,S^), 

where S^, controlled by parameter vectors and Pw, 
is given in Eqn. 

We specify independent F(am,6u,) priors for each X^i 
and independent beta(ap^ , 6p^) priors for the PwUS'- 

7r(A^j) oc A^^^^e"*"™^™', i = l,...,p^, 

T^iPwil) oc p^^.y^^l - Ptuji )''""' i = 1, . . . ,_p,,, 

l^l,... ,pg. 

We expect the marginal variance for each Wi{-) process 
to be close to unity due to the standardization of the 
simulator output. For this reason we specify that ayj — 
bw — 5. Thus Xwi has a prior mean of 1, and a prior 
standard deviation of 0.45. In addition, this informative 
prior helps stabilize the resulting posterior distribution 
for the correlation parameters which can be traded off 
with the marginal precision parameter [53 |. 

Because we expect only a subset of the inputs to in- 
fluence the simulator response, our prior for the correla- 
tion parameters reflects this expectation of "effect spar- 
sity" . Under the parameterization in Eqn. (|13p . input 
I is inactive for PC i if p^u — 1. Choosing = 1 
and < bp^ < 1 will give a density with substantial 
prior mass near 1. We take bp^ = 0.1, which makes 
P^iPwii < 0.98) « 1/3 a priori. In general, the selection 
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of these hyperparameters should depend on how many of 
the pe inputs are expected to be active. 

If we take the error vector in the basis representa- 
tion of Eqn. ^ to be i.i.d. (independent and identically 
distributed) normal, we can then develop the sampling 
model, or likelihood, for the simulator output. We de- 
fine the n^m- vector rj to be the concatenation of all m 
simulation output vectors 

77 = vec(ysims) = yec{[ri{6l); • • • ; rjie*^)]). (16) 

Given precision A,, of the errors, the likelihood is then 

L{ri\w, \n) oc \~ exp {-iA^(7; - <^>w)'^ {-q - $w)} , 

(17) 

where the x mp^ matrix <i> is given by 

$ = ;/™®0pJ, (18) 

and the 's are the Pr, basis vectors previously computed 
via SVD. A r(a,,,6,,) prior is specified for the error pre- 
cision A^. This parameter controls the size of the errors 
between the actual simulations and the basis representa- 
tion of the simulations. We expect the data to be very in- 
formative about A^, so we choose = 1 and 6^ = .0001, 
which gives little prior information regarding A^. 
Since the likelihood factors are as shown below 

L(?7|w, A,,) oc 

\ ^ exp {-iA^(w - i())^($^<i>)(w - w)] X 

A,r^^exp{-iA„r;^(/-$($^$)-i$^)?7} , 

the formulation can be equivalently represented with a 
dimension-reduced likelihood and a modified r(a'j,6Jj) 
prior for A^: 

L{w\w, Xri) OC 

exp {-iA^(w - w)^($'^$)(i& - w)} , 

(19) 

where 

, , ^{nrj -Pn) 
% = «') + 2 ' 

^'n = ^ + i??^(/-«'($'^«')"^*^)?7, and 

w = ($'^$)"i$'^?7. (20) 

Thus, the normal likelihood for 77 with the gamma prior 
for A,, 

77|W, A,, N{(^W,X^'^Inr,)i A,, T{ar,,br,) 

is mathematically equivalent to a normal model for w 
with an altered gamma-prior model for A,, 

w\w, \ ~ N{w, (A^$^$)-i), A„ - r(a;, h'^) 



since 

L{ri\w, A,,) X 7r(A^; a,,, 6,,) oc L{w\w, A,,) x 7r(A,,; a^, b'^). 

(21) 

The likelihood depends on the simulations only 
through the computed PC weights w. After integrating 
out w, the posterior distribution becomes 

7r(A,,, Auj, p^|u)) oc 

|(A^$^$)-i-KS^r^ X (22) 
exp{-li(;^([A„$^$]~i + Y^^y^w} x (23) 

P17 Pv Pe 

aJ'^^-";.^" n A^r^^-""'- n 11(1 - p^,i^^-\ 

1=1 i—1 j — 1 

This posterior distribution is a milepost on the way to 
the complete formulation, which also incorporates exper- 
imental data. However, it is well worth further consider- 
ing this intermediate posterior distribution for the sim- 
ulator response. It can be explored via MCMC using 
standard Metropolis updates and we can view a number 
of posterior quantities to illuminate features of the simu- 
lator. For example, Fig.[4]shows boxplots of the posterior 
distributions for the components of pw From this figure 
it is apparent that PCs 1 and 2 are most infiuenced by 
as and ficDM- 
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FIG. 4: Boxplots of posterior samples for each p^u for the 
mass power spectrum. 

Given the posterior realizations from Eqn. ((22|) . one 
can generate realizations from the process rj{9) at any 
input setting 0*. Since 

Pv 

7?(fc;r)=^</),(fc)u;,(0*), (24) 

i=l 

realizations from the Wi{9*) processes need to be drawn 
given the MCMC output. For a given draw (A^, A^,, p^) 
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a draw of w* = {wi {9*), . . . , Wp^ {d*)y can be produced 
by making use of the fact 



N 



^w^w* 



(25) 



where E^,^tu* is obtained by applying the prior covari- 
ance rule to the augmented input settings that include 
the original design and the new input setting (6**). [Re- 
call that w is defined in Eqn. (PO)) .] Application of the 
conditional normal rules then gives 



w*\w ~ NiV2iV-,^w, V22 - V21V-W12), (26) 



where 



V 



Vn V12 
V21 ^22 

(A^$^$)-i 




^ i^^w 1 Pw ) 



(27) 



is a function of the parameters produced by the 
MCMC output. Hence, for each posterior realization of 
{X,i , Xw , Pw) , a realization of w* can be produced. The 
above recipe easily generalizes to give predictions over 
many input settings at once. 

Fig.[5]shows posterior means for the simulator response 
r] where each of the inputs is varied over its prior range of 
[0, 1] while the other four inputs are held at their nominal 
setting of 0.5. The posterior mean response conveys an 




3 -2 -1 



FIG. 5: Changes to the posterior mean simulator predictions 
obtained by varying one input, while holding others at their 
central values, i.e. at the midpoint of their range. The light 
to dark lines correspond to the smallest parameter setting to 
the biggest, for each parameter. 

idea of how the different parameters affect the highly mul- 
tivariate simulation output. Other marginal functionals 
of the simulation response can also be calculated such as 
sensitivityindicics or estimates of the Sobol decomposi- 
tion [5il.l53|. 

Note that a simplified emulator can be constructed by 
taking point estimates for (A^, A^,, puj) (posterior mean, 
or posterior medians) and then defining the emulator to 
be the mean in Eqn. 



1. Emulator Test and Convergence 

How do we know that the constructed emulator is ac- 
curate and how many model simulations are needed to 
obtain the desired emulation accuracy? The answers to 
these questions depend very much on the smoothness of 
the function we wish to emulate. If the function is smooth 
and almost featureless, one anticipates that only a small 
number of simulations should suffice to yield an acccept- 
able emulator. The presence of features and noise in the 
function to be emulated will clearly require many more 
simulation outputs; it may well be that beyond some 
point the optimal emulation strategy is no longer based 
on the GP model. The number of simulations required to 
build an accurate emulator depends strongly on the num- 
ber of active parameters, i.e., parameters that change the 
output considerably when varied. 

To test the accuracy of a proposed emulator, so-called 
hold-out tests are very useful. The basic idea is simple: 
A small subset of of the simulations is set aside and the 
emulator built on the remaining simulations. Then the 
new emulator is evaluated at the parameter settings of 
the held-out simulations. By comparing the emulator 
and simulation results, the accuracy of the emulator can 
be estimated. An example of this approach is provided 
in Ref. [s^l where we perform hold-out tests by build- 
ing the emulator on a subset of 125 out of the total of 
128 simulations. Then we test the the emulator on the 
remaining three simulations by running it at the exact 
parameter settings of the three holdouts. In this way we 
test (in turn) the emulator on all 128 simulations. 

In the current paper, we present a slightly modified 
strategy. In addition to the 128 run design, we also em- 
ploy an independent 64 run design as a reference to in- 
vestigate the accuracy of the emulator. The present ap- 
proach has the advantage that the emulator under test 
is built on the full set of simulations. This might not be 
too important if the design is already large, but if the de- 
sign is restricted to a small number of runs, every run is 
important in determining the accuracy of the emulator, 
especially near the boundaries of the sampling domain. 

In the left panel of Fig. [5] we show the results for the 
predictions of the 128 run emulator on the additional 64 
runs. Overall, the emulator performance is very satisfac- 
tory. We display the residuals of the emulator prediction 
compared to the simulation runs on the native scale. The 
dark gray band contains the middle 50% of the residuals, 
the light gray band the middle 90%. The overall accu- 
racy of our emulator over a wide range in wave number 
is better than ~ 5%. Only on the edges of the parame- 
ter ranges investigated is the fidelity slightly worse. Note 
that the emulation accuracy is strongly dependent on the 
size of the allowed parameter range, and improves signif- 
icantly as this range is shrunk. We will come back to this 
point later below. 

Next we investigate how the accuracy of the emulator 
depends on the number of the underlying simulations. 
We create a design for 32 simulations using an OA-LH 
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FIG. 6: Emulator predictions for a 64 run design. Left: em- 
ulator based on 128 runs, right: emulator based on 32 runs. 
The central gray region contains the middle 50% of the residu- 
als, the wider light gray region, the middle 90%. The outliers 
are shown as dots. The improvement of the emulator accuracy 
with more training runs is evident, especially in the medium 
fc-range. 



sampling design and test it on the same reference 64 de- 
sign run we already used for testing the 128 run emulator. 
In Fig. [SI right panel, we show the same statistics as for 
the 128 run emulator. The overall quality of the emulator 
is still very good, at the 10% level. Compared with the 
larger-design emulator, however, the predictions in the 
medium k range show a higher level of deviation from 
the reference values. 



D. Full Statistical Formulation 



number is quite dense, this interpolation is straightfor- 
ward. 

We specify a r(aj,, by) prior for the precision parameter 
Aj, resulting in a normal-gamma form for the data model 

y\w{9), Xy ^ N{%w{e), (XyWy)-'), A, ^ r(a„ by). 

(28) 

The observation precision Wy is often fairly well-known 
in practice. Hence one may choose to fix A^ at 1, or 
use an informative prior that encourages its value to be 
near one. In the mass power spectrum example we fix 
Ay at 1, though we continue to use this parameter in the 
formulation below. 

Equivalently, Eqn. ([^5]) can be represented in terms of 
the basis weights 



Wy\w{0),Xy 



with 



N{W{9), {Xy<i>lWy<fy)-'), Xy ^ T (fl^ , b'y) , 

(29) 



{^lWy'i>y)-'^lWyy, 



= Uy + i(n — p,,), and 



by + hiV- ^vWy)'^Wv{y - ^yWy)- 



This equivalency follows from Eqn. (PTjl given in Sec. Ill Cl 

The (marginal) distribution for the combined, reduced 
data obtained from the experiments and simulations 
given the covariance parameters, has the form 



N 







A-1 



-'WyW 



where Yi^ is defined in ([15]), 



(30) 



Given the model specifications for the simulator rj{6), 
we can now consider the sampling model for the exper- 
imentally observed data. The data are contained in an 
n- vector y. For the synthetic mass power spectrum appli- 
cation, n = 28, corresponding to the different wave num- 
bers as shown in Fig. [TJ As stated in Section [Til the data 
are modeled as a noisy version of the simulated spectrum 
ri[k; 6) run at the true, but unknown, parameter setting 
^1 y = vi^j ^) + where the errors are assumed to be 
N^Oj'Ey). For notational convenience we represent 'Sy^ 
as XyWy, leaving open the option to estimate a scaling 
of the error covariance with A^. Using the basis repre- 
sentation for the simulator this becomes 



y = <^yw{e)+e, 
where w{6) is the p,,-vector (wi(9),. 



(0))' 



Be- 
cause the wave number support of y is not necessarily 
contained in the support of the simulation output, the 
basis vectors in $y may have to be interpolated over wave 
number from $„. Since the simulation output over wave 



Ay = Xy^'^Wy%, 



'-Pv 



Pri X identity matrix, 

/x-lR{e,e*;p.^i) 



V 



^wl^R{(^^(^*\Pwp,X 



Above, R{6,6*\ puji) denotes the 1 x m correlation sub- 
matrix for the GP modeling the simulator output ob- 
tained by applying Eqn. to the observational setting 
9 crossed with the m simulator input settings . . . , 0^. 



1. Posterior distribution 

If we take z to denote the reduced data {wy ,w'^)'^ , 
and to be the covariance matrix given in Eqn. (I30p . 
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FIG. 7: Estimated posterior distribution of the parameters 
9 = (n, /i, erg, ilcDM, f^b)- The diagonal shows the estimated 
marginal posterior pdf for each parameter; the off-diagonal 
images give estimates of bivariate marginals; the contour lines 
show estimated 90% hpd regions. The true values from which 
the data were generated are shown by the red dots. 



the posterior distribution has the form 

7r(A^, A„,, p,„, \y,9\z) cx 

|S5|-5exp{-iz^Sriz} X 

, Pv 



(31) 



Pv Pa 



i=l 
~1 , 



i=l 1 = 1 

where C denotes the constraint region for 0, which is typ- 
ically a pe-dimensional rectangle. In other applications C 
can also incorporate constraints between the components 
of 0. 

Realizations from the posterior distribution are pro- 
duced using standard, single site MCMC. Metropohs up- 
dates [54*] are used for the components of pw and 9 with 
a uniform proposal distribution centered at the current 
value of the parameter. The precision parameters A,,, A^ 
and Aj^ are sampled using Hastings updates [5^. Here 
the proposals are uniform draws, centered at the current 
parameter values, with a width that is proportional to 
the current parameter value. In a given application the 
candidate proposal width can be tuned for optimal per- 
formance. 

The resulting posterior distribution estimate for is 
shown in Fig. [7] on the original scale. It nicely brackets 
the true values of 6* = (0.99,0.71,0.84,0.27,0.044) from 
which the synthetic data were generated. 



III. COMBINED CMB AND LARGE SCALE 
STRUCTURE ANALYSIS 

So far our analysis has focused on a single observational 
data set, but cosmological parameter estimation requires 
combining a number of different observational datasets 
with their (separate) associated modeling methodologies. 
We now consider a simple example of this, the joint anal- 
ysis of CMB temperature anisotropy data and the mass 
power spectrum as sampled by large-scale structure sur- 
veys. 

The power spectrum of the CMB anistropy as a func- 
tion of angular scale or multipole moment I has signifi- 
cantly more structure than the matter power spectrum 
and therefore presents a more challenging test for our 
framework. In this section we extend our synthetic data 
set to include these measurements. We restrict our anal- 
ysis to the six-dimensional "vanilla" ACDM model de- 
scribed by 



= {n,h,cTs,^cDM,^h,r). 



(32) 



Different groups analyzing different data sets, e.g. 
Refs. 0, [l^, found that the model specified by these six 
parameters consistently fits all currently available dat_a. 
Our synthetic data set mimics data from WMAP-III 
and the galaxy mass power spectrum from the SDSS [31 
For the synthetic dataset, we choose the same cosmology 
as given in Eqn. ^ and in addition the optical depth: 



T = 0.09. 



(33) 



We allow r to vary between and 0.3 in our analysis 
below. The analysis of the CMB data was carried out 
using the CAMB [l^ code throughout. We have care- 
fully checked that the transfer functions generated with 
CMBFAST for the TV-body simulations agree very accu- 
rately with the CAMB transfer functions. Our synthetic 
data set allows us to ignore real-world corrections such 
as for the normalization from the Sunyaev-Zel'dovich ef- 
fect due to hot gas in clusters. The WMAP-III analysis 
showed that such considerations can become important 
for precision measurements of cosmological parameters. 



A. Constraints from the Cosmic Microwave 
Background 

Before we carry out a combined analysis of galaxy sur- 
vey and CMB data, we investigate how well our frame- 
work does in dealing with the CMB temperature angular 
power spectrum (TT). We do not consider CMB polar- 
ization here but there is no obstacle to including it in 
future studies. 

As done earlier for the matter power spectrum we gen- 
erate a synthetic data set for the TT spectrum. For 
the error bars we assume the same magnitude as in the 
WMAP-III analysis. We first run CAMB at the parame- 
ter settings specified in Eqns. © and ([55)1 and then move 
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the data points off the base TT spectrum according to 
a Gaussian distribution with l-cr confidence level. Fol- 
lowing the matter power spectrum analysis we create a 
design for 128 runs, this time for a six-parameter space. 
In Fig. [8] we show the analog to Fig. [T] for the TT spec- 
trum: the gray lines show a subset of the 128 CAMB 
runs, the dashed line the actual input power spectrum 
and the red points the data points which will be used for 
our analysis. 

In Fig. [5] we show the analog to Fig. [HI demonstrat- 
ing that the Ci emulator predicts 90% of the holdout 
runs to better than 10% and 50% of the runs to better 
than 5%. This accuracy is impressive considering the dy- 
namic range and complexity of the C;s (and the broad 
range over which the cosmological parameters are var- 
ied). However, this complexity requires more PCs for 
accurate emulation. We have kept twelve PCs for the Ci 
analysis compared to five PCs for the matter power spec- 
trum analysis. The bivariate marginal plot summarizing 
the inference in the cosmological parameters taking into 
account the TT spectrum alone is given in Fig. (TU] (com- 
pare to Fig. [7]). 

Other groups who have developed interpolation 
schemes to predict the temperature power spectrum 12^, 
HH, m, m, [ll] choose much narrower priors than we have 
in our example. If, as is typical, we reduce the parameter 
ranges to 3-a around the best-fit WMAP data, leading 
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FIG. 8: Subset of 128 simulated TT power spectra (gray lines) 
along with the synthetic observations. The dashed line shows 
the actual spectrum from which the data were generated. The 
blue lines show the limits of the restricted parameter ranges 
from Eqns. Pijl. 



FIG. 9: Emulator performance on holdout test. The cen- 
tral gray region contains the middle 50% of the residuals, the 
wider light gray region, the middle 90%. The outliers are 
shown as blue curves. Upper plot: Residuals for the emula- 
tor built on the conservative priors given in Eqns. ([8]). Lower 
plot: Residuals for the emulator built on 3-a priors around 
the best-fit WMAP-III parameters. The emulator errors are 
reduced by an order of magnitude for the smaller parameter 
ranges. 



to: 



0.85 < 71 < 1.25, 

0.6 < h < 0.9, 

0.6 < erg < 1.2, 

0.06 < f^cDM^i^ < 0.2, 

0.018 < rihh^ < 0.034, 

0.01 < T < 0.55, 



(34) 



the dynamic range of the C;s is in turn reduced by an 
order of magnitude (blue lines in Fig. [8|). This leads 
to an improvement of the emulator quality by an order 
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of magnitude to yield results at sub-percent level accu- 
racy (Fig. [51 lower panel). We stress that our parameter 
ranges in this case are still larger than what is consid- 
ered by other groups. Their procedures are restricted to 
be valid at 3-(t around the best-fit WMAP model, cover- 
ing only a small range in parameter space. We compare 
our method with the other approaches in more detail in 
Appendix [X] 
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FIG. 10: Estimated posterior distribution following Fig.[7]for 
the synthetic CMB TT spectrum and including the optical 
depth r. 
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FIG. 11: Estimated posterior distribution following Fig. [10] 
with the addition of matter power spectrum data. 



is the influence of posterior correlations among the cos- 
mological parameters induced by the two datasets. For 
example, the degeneracy between n and r allows the in- 
clusion of matter power spectrum data (the simulations 
of which are independent of t) to significantly improve 
the estimation of r. 

It is easy to see that including additional datasets into 
this method is straightforward. 



B. Combined Constraints 

We now proceed to systematically combine information 
contained in the TT and matter power spectra. Given 
two sets of observed data, yi and j/2 that inform on a 
common set of parameters, the posterior density is 

7r(0|yi,y2)(xL(yi,y2|e)-^(e). (35) 

Assuming statistical independence of the two datasets, 
the likelihood factors into: 

7r(0|yi, y2) « L{y^\Q) ■ L{y^\B) ■ ^(^), (36) 

which is the form we have incorporated into our statisti- 
cal analysis code. 

The payoff from including both sets of observational 
data is illustrated in Figs . [TT] and [T^ and in Table [11 The 
posterior volume of the cosmological parameter space is 
significantly reduced by the addition of the matter power 
spectrum constraints. There are two main reasons for 
this volumetric reduction. First is the expected statisti- 
cal increase due to the addition of two independent and 
consistent pieces of data. Second, and more interesting, 



IV. DISCUSSION AND CONCLUSION 

We have demonstrated a powerful and general statis- 
tical methodology for performing computer simulation- 
based inference of cosmological parameters. To do this, 
we borrow techniques from a variety of statistical fields 
- including experimental design, spatial statistics and 
Kriging, and Bayesian inference - and apply them in a 
highly integrated manner to the problem of constraining 
computational models directly from the observed data. 
Several items are particularly noteworthy. First, care- 
ful simulation design prevents combinatorial explosions 
of simple grid-like designs in highly multivariate environ- 
ments. Second, the GP-based emulator design is crit- 
ical to the efficient sampling of the posterior probabil- 
ity density and allows us to evaluate global measures of 
the simulator sensitivity cheaply and accurately. Finally, 
our method isolates the separate sources of uncertainty 
in any particular inference, e.g., the uncertainty due to 
imperfect emulation, and folds them into the overall in- 
ference uncertainty. This procedure has the conceptual 
advantage of coherence and the practical advantage of 
guarding against overly optimistic parameter estimates. 
It can be easily extended to very large parameter spaces 
and to include different data sources such as CMB and 
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TABLE I: Parameter constraints (mean value ± 1 standard deviation) for six parameters from CMB TT power spectrum 
analysis alone and with the adddition of large scale structure data. The first line gives the true value of the parameters. 





n 


h 


(^8 




fib 


r 


Truth 


0.99 


0.71 


0.84 


0.27 


0.044 


0.09 


CMB only 


1.0185 ± 0.0422 


0.7527 ± 0.0544 


0.8824 ± 0.1004 


0.2387 ± 0.0592 


0.0397± 0.006 


0.1241 ± 0.0763 


CMB + LSS 


1.0090± 0.0263 


0.7335 ± 0.0371 


0.8722 ± 0.0262 


0.2560 ± 0.0387 


0.0413 ± 0.0044 


0.0922 ± 0.0434 
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FIG. 12: Posterior univariate marginal density estimates for 
the cosmological parameters taking into account the TT only 
data (white) and both the TT and the matter power spectrum 
data (orange), showing the increased precision obtainable by 
systematically including multiple sources of data. 



large scale structure measurements. 

Depending on the parameter ranges considered, our 
emulation scheme can perform at the sub-percent level 
accuracy, thereby, at least in principle, satisfying a fun- 
damental requirement for next-generation cosmological 
analysis tools. It is especially suited to - and designed 
for - problems where the underlying simulations are very 
costly and only a limited number can be performed. 
While clearly of more general utility, our framework is 
targeted to analysis of upcoming large-scale structure 
based probes of dark energy, such as weak lensing and 
baryon acoustic oscillations. Future directions and the 
usage for our framework in cosmology are manifold. We 
plan to extend the set of cosmological parameters un- 
der consideration to include the dark energy equation 
of state parameter w and to publicly release an emulator 
scheme based on very accurate, high-resolution dark mat- 
ter simulations. The framework can also be used to fit 
supernova light-curves or determine photometric galaxy 



redshifts based on training sets. Work in this direction 
is already in progress. 
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APPENDIX A: COMPARISON WITH OTHER 
METHODS 

In this Appendix we compare the performance of our 
emulator for the TT power spectrum with other recently 
introduced interpolation schemes. An early attempt is 
based on the idea of splitting the TT power spectrum 
into low-Z {I < 100) and high-Z regions {I > 100) and to 
use analytic fits to express the C;s 19]. This method 
is accurate at the 10% level and forms the basis of the 
approach underlying DASh (Davis Anisotropy Shortcut) 
developed in Ref. \2V\ . DASh relies on rapid analytic and 
semi-analytic approximations and leads to good accuracy 
(at the 2% level on average) and performance. Another 
interpolation scheme, CMBwarp [20], is based on intro- 
ducing a new set of cosmological parameters (for details 
on these parameter choices see Ref. [56]) which better re- 
fiect the underlying physics of the CMB. The degeneracy 
structure in the new parameter space is much simpler and 
therefore helps with MCMC convergence. The new pa- 
rameter set has almost-linear infiuence on the TT power 
spectrum (i.e., the spectrum moves mainly vertically or 
horizontally with parameter changes). It is therefore rel- 
atively easy to find a polynomial fit around a fiducial 
model that is reliable for different parameters. CMBwarp 
is faster than DASh at the same level of accuracy. Both 
approaches have two major drawbacks: they only lead 
to accurate results in a narrow parameter range around 
a fiducial model and incorporation of new parameters is 
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FIG. 13: Residuals for the Pico emulator of the TT power 
spectrum, tested on 64 runs. The upper plot in red shows the 
residuals for runs which are in the 3-a confidence level around 
the best-fit WMAP-I model. 22 out of the 64 runs fulfiU this 
criterion. The residuals are at the 0.5% accuracy level in this 
case. Note the slight downward trend in all cases. The lower 
plot in blue shows the remaining 42 runs which are in the 3-a 
confidence level around the best fit parameters from WMAP. 
In this case the errors are much worse and the emulator is not 
controlled. Again a slight downward trend can be seen in all 
the runs. 



very difficult since new fits have to be developed. In 
the case of CMBwarp, any additional parameter has to 
be "orthogonal" to the others, which might be difficult to 
achieve. Very recently, a new interpolation method, Pico, 
was introduced in Ref. [l^l . Pico is based on a large num- 
ber of training sets (a ~ 10^ run MCMC chain). It allows 
very accurate and very fast determination of temperature 
power spectra around the best-fit WMAP model. Pico 
is trained to compute the power spectra within several 
log likelihoods around the peak. Therefore, good perfor- 
mance away from the peak cannot be guaranteed. Inte- 
gration of new parameters is possible by generating new 
training data sets. Very similar to the Pico approach is 
CosmoNet [23]. In this approach, a neural network is 
trained on 2,000 CAMB runs. The CosmoNet results are 
very similar in accuracy and speed to Pico. 

We concentrate our comparison on two publicly avail- 
able codes: CMBwarp and Pico. Both codes are designed 
to yield reliable, accurate results within WMAP's first- 
year 3-cr confidence region around the best-fit model. The 
parameter ranges we have investigated so far are much 
broader. Therefore, in order to be able to carry out a 
meaningful comparison between the three different ap- 
proaches, we build a new emulator based on 128 CAMB 
runs in the parameter range specified in Eqn. ([M)) . (Note 
that we now also include in our variable choice for 
the dark matter and baryon content of the Universe). 
These parameter ranges are within 3-cr about the best fit 
parameters, which is still a larger range than Pico and 



FIG. 14: Results following Fig. [13] but for the CMBwarp em- 
ulator. The errors are overall an order of magnitude bigger 
than for Pico, in agreement with the findings of Ref. 



CMBwarp allow. In fact, their allowed ranges are rather 
narrow and cannot be cast in the form of a symmetric 
box around the best fit parameters. 

In addition to the 128 training runs we generate 64 
reference CAMB runs for testing our emulator as well as 
CMBwarp and Pico. In Fig. [13] we show the emulator 
quality for Pico. Only 22 out of the 64 simulations lie 
in the allowed parameter range; we display the residuals 
for these runs in the upper part of Fig. [T3] in red. The 
accuracy in this parameter range is very good, at the 
0.5% level. Somewhat worrisome though is the existence 
of a systematic trend in the Pico data: All residuals are 
systematically low for large /, possibly leading to biases 
in the parameter constraints. In the lower plot of Fig. [13] 
we show the residuals for the remaining 42 simulations 
in blue, which lie in the 3-(T interval around the best 
tit parameters but not the best tit model for WMAP-I. 
Here the errors are much bigger, by more than an order 
of magnitude. This shows that Pico should not be taken 
out of the parameter range it was trained for, and is 
not robust against extrapolation into larger parameter 
ranges (as expected for a polynomial-based fit). Fig. [H] 
shows the residuals for the CMBwarp runs. As for Pico 
we have divided them into runs which are inside the 3- 
a confidence level around the best fit model and those 
outside, but still inside the 3-cr confidence level of the best 
tit parameters. In general, the performance of CMBwarp 
is an order of magnitude worse than the performance of 
Pico, confirming the results in Ref. [22]. For / > 180, 
the fit is slightly modified, leading to a small kink at this 
value in the results for the C/s. The performance on all 
64 runs of our emulator scheme based on CP models is 
shown in Fig. [9] 50% of all runs are predicted with sub- 
percent accuracy, 90% with an accuracy between one and 
two percent. Not a single prediction is off by more than 
3%. This demonstrates impressively the stability of our 
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emulator scheme over large parameter ranges. 
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